AITopics | iteration time

Training Foundation Models on a Full-Stack AMD Platform: Compute, Networking, and System Design

Anthony, Quentin, Tokpanov, Yury, Szot, Skyler, Rajagopal, Srivatsan, Medepalli, Praneeth, Golubeva, Anna, Shyam, Vasu, Washbourne, Robert, Iyer, Rishi, Chaurasia, Ansh, Figliolia, Tomas, Yang, Xiao, Sarje, Abhinav, Thorstensen, Drew, Pearson, Amartey, Grossbart, Zack, van Patten, Jason, Barsoum, Emad, Gu, Zhenyu, Fu, Yao, Millidge, Beren

arXiv.org Artificial Intelligence, Dec-5-2025

We report on the first large-scale mixture-of-experts (MoE) pretraining study on pure AMD hardware, utilizing both MI300X GPUs and Pollara networking. We distill practical guidance for both systems and model design. On the systems side, we deliver a comprehensive cluster and networking characterization: microbenchmarks for all core collectives (all-reduce, reduce-scatter, all-gather, broadcast) across message sizes and GPU counts over Pollara. To our knowledge, this is the first at this scale. We further provide MI300X microbenchmarks on kernel sizing and memory bandwidth to inform model design. On the modeling side, we introduce and apply MI300X-aware transformer sizing rules for attention and MLP blocks and justify MoE widths that jointly optimize training throughput and inference latency. We describe our training stack in depth, including often-ignored utilities such as fault-tolerance and checkpoint-reshaping, as well as detailed information on our training recipe. We also provide a preview of our model architecture and base model - ZAYA1 (760M active, 8.3B total parameters MoE, available at https://huggingface.co/Zyphra/ZAYA1-base) - which will be further improved upon in forthcoming papers. ZAYA1-base achieves performance comparable to leading base models such as Qwen3-4B and Gemma3-12B at its scale and larger, and outperforms models including Llama-3-8B and OLMoE across reasoning, mathematics, and coding benchmarks. Together, these results demonstrate that the AMD hardware, network, and software stack are mature and optimized enough for competitive large-scale pretraining.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.17127

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Communications > Networks (0.95)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)

SMART: A Surrogate Model for Predicting Application Runtime in Dragonfly Systems

Wang, Xin, Rizzini, Pietro Lodi, Medya, Sourav, Lan, Zhiling

arXiv.org Artificial Intelligence, Nov-17-2025

The Dragonfly network, with its high-radix and low-diameter structure, is a leading interconnect in high-performance computing. A major challenge is workload interference on shared network links. Parallel discrete event simulation (PDES) is commonly used to analyze workload interference. However, high-fidelity PDES is computationally expensive, making it impractical for large-scale or real-time scenarios. Hybrid simulation that incorporates data-driven surrogate models offers a promising alternative, especially for forecasting application runtime, a task complicated by the dynamic behavior of network traffic. We present SMART, a surrogate model that combines graph neural networks (GNNs) and large language models (LLMs) to capture both spatial and temporal patterns from port-level router data. SMART outperforms existing statistical and machine learning baselines, enabling accurate runtime prediction and supporting efficient hybrid simulation of Dragonfly networks.

iteration time, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2511.11111

Country: North America > United States (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (0.46)
Government > Regional Government (0.46)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.76)

1c6a0198177bfcc9bd93f6aab94aad3c-AuthorFeedback.pdf

Neural Information Processing Systems, Oct-2-2025, 07:06:44 GMT

artificial intelligence, machine learning, supplementary material, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.34)

PolyServe: Efficient Multi-SLO Serving at Scale

Zhu, Kan, Shi, Haiyang, Xu, Le, Shan, Jiaxin, Krishnamurthy, Arvind, Kasikci, Baris, Xie, Liguang

arXiv.org Artificial Intelligence, Jul-25-2025

Advances in Large Language Models (LLMs) have led to a surge of LLM-powered applications. These applications have diverse token-generation latency requirements. As a result, simply classifying workloads as latency-sensitive (LS) or best-effort (BE) overlooks the nuances within the latency-sensitive category and results in suboptimal user experiences and scheduling opportunities. However, efficiently serving requests with multiple SLO requirements poses significant challenges. First, all requests within a batch generate new tokens simultaneously, which can misalign them with their distinct SLO requirements. Moreover, while existing systems focus on auto-scaling for handling various overall request rates, the diversity of SLOs necessitates fine-grained auto-scaling among these SLO tiers. Finally, unlike LS/BE scenarios, where BE requests can be aborted at any time to ensure the SLO attainment of LS requests, those with different latency-sensitive SLOs cannot tolerate prolonged delays, and tail latency must be controlled. To tackle these challenges, we propose PolyServe, a novel multi-SLO scheduling policy at scale that maintains high SLO attainment while maximizing throughput. PolyServe first groups requests into multiple bins based on their per-token latency requirement, then schedules each bin to a subset of the server fleet. PolyServe routes requests to the highest-load but still SLO-attainable server to create a load gradient that facilitates auto-scaling. To increase utilization, PolyServe permits looser-SLO requests to share tighter-SLO instances when their own servers are saturated. PolyServe uses profiling data to guide scheduling decisions and manage tail latency through request-wait-time-aware scheduling, dynamic chunking, and continuous chunked prefill prediction. PolyServe achieves 1.23x goodput gain compared to existing policies, achieving up to 92.5% of optimal goodput.
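The binning and routing steps described in this abstract can be sketched roughly as follows. This is an illustration only, not PolyServe's implementation: the `Server` fields, the bin edges, the integer load units, and the helpers `assign_bin` and `route` are hypothetical names introduced here, and the per-server `capacity` is assumed to come from profiling.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    load: int      # current load in arbitrary units (assumption)
    capacity: int  # highest load at which the bin's SLO is still met (assumed known)


def assign_bin(slo_ms, bin_edges_ms):
    """Pick the loosest bin whose per-token latency bound does not exceed the SLO.

    bin_edges_ms is sorted ascending; bin i is assumed to serve tokens within
    bin_edges_ms[i] milliseconds.
    """
    idx = bisect_right(bin_edges_ms, slo_ms) - 1
    if idx < 0:
        raise ValueError("SLO is tighter than the tightest bin")
    return idx


def route(servers, cost):
    """Send the request to the most loaded server that can still absorb it.

    Returns None when no server in the bin has room; a real system might then
    scale up or borrow capacity from another bin.
    """
    feasible = [s for s in servers if s.load + cost <= s.capacity]
    if not feasible:
        return None
    chosen = max(feasible, key=lambda s: s.load)
    chosen.load += cost
    return chosen


if __name__ == "__main__":
    fleet = [Server("a", 20, 100), Server("b", 70, 100), Server("c", 95, 100)]
    print(assign_bin(40, [20, 50, 100]))  # bin index for a 40 ms per-token SLO
    print(route(fleet, 10).name)          # most loaded server with room
```

Preferring the most loaded feasible server packs work onto a few machines and leaves the rest lightly loaded, which is one way to read the "load gradient" the abstract mentions: idle servers at the low end become candidates for scale-down.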

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2507.17769

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Exploring Explainable Multi-player MCTS-minimax Hybrids in Board Game Using Process Mining

Qian, Yiyu, Miller, Tim, Qian, Zheng, Zhao, Liyuan

arXiv.org Artificial Intelligence, Mar-30-2025

Monte-Carlo Tree Search (MCTS) is a family of sampling-based search algorithms widely used for online planning in sequential decision-making domains and at the heart of many recent advances in artificial intelligence. Understanding the behavior of MCTS agents is difficult for developers and users due to the frequently large and complex search trees that result from the simulation of many possible futures, their evaluations, and their relationships. This paper presents our ongoing investigation into potential explanations for the decision-making and behavior of MCTS. A weakness of MCTS is that it constructs a highly selective tree and, as a result, can miss crucial moves and fall into tactical traps. Full-width minimax search offers a remedy. We integrate shallow minimax search into the rollout phase of multi-player MCTS and use a process mining technique to explain agents' strategies in 3v3 checkers.
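The idea of guiding rollouts with a shallow minimax search can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the paper's method: the toy two-player `Nim` game stands in for multi-player checkers, and the names `minimax` and `rollout` and the +1/0/-1 scoring are hypothetical choices made here.

```python
import random


class Nim:
    """Toy game (hypothetical stand-in): take 1-3 stones; taking the last stone wins."""

    def __init__(self, stones, player=0):
        self.stones, self.player = stones, player

    def moves(self):
        return [k for k in (1, 2, 3) if k <= self.stones]

    def play(self, k):
        return Nim(self.stones - k, 1 - self.player)

    def terminal(self):
        return self.stones == 0

    def winner(self):
        # The player who moved last took the final stone.
        return 1 - self.player


def minimax(state, depth, me):
    """Depth-limited minimax from `me`'s view: +1 win, -1 loss, 0 if the horizon is reached."""
    if state.terminal():
        return 1 if state.winner() == me else -1
    if depth == 0:
        return 0
    values = [minimax(state.play(m), depth - 1, me) for m in state.moves()]
    return max(values) if state.player == me else min(values)


def rollout(state, depth, rng):
    """Play to the end; each move is chosen by shallow minimax, ties broken at random."""
    while not state.terminal():
        me = state.player
        scored = [(minimax(state.play(m), depth - 1, me), m) for m in state.moves()]
        best = max(v for v, _ in scored)
        state = state.play(rng.choice([m for v, m in scored if v == best]))
    return state.winner()


if __name__ == "__main__":
    rng = random.Random(0)
    print(rollout(Nim(3), 1, rng))   # player 0 can take all three stones at once
    print(rollout(Nim(10), 2, rng))  # longer game; early moves are mostly random
```

With a purely random rollout, a player can walk past an immediate win; the depth-limited lookahead catches such short tactical lines while staying far cheaper than a full-width search of the whole game.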

algorithm, artificial intelligence, fitness, (15 more...)

arXiv.org Artificial Intelligence

2503.23326

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)

Autonomous Robotic Radio Source Localization via a Novel Gaussian Mixture Filtering Approach

Kim, Sukkeun, Moon, Sangwoo, Petrunin, Ivan, Shin, Hyo-Sang, Khattak, Shehryar

arXiv.org Artificial Intelligence, Mar-13-2025

This study proposes a new Gaussian Mixture Filter (GMF) to improve the estimation performance for the autonomous robotic radio signal source search and localization problem in unknown environments. The proposed filter is first tested on a benchmark numerical problem to validate its performance against other state-of-practice approaches such as Particle Gaussian Mixture (PGM) filters and the Particle Filter (PF). Then the proposed approach is tested and compared against PF and PGM filters in real-world robotic field experiments to validate its impact for real-world robotic applications. The considered real-world scenarios have partial observability due to range-only measurements and uncertainty in the measurement model. The results show that the proposed filter can handle this partial observability effectively whilst showing improved performance compared to PF, reducing the computation requirements while demonstrating improved robustness over the compared techniques.

covariance, gaussian distribution, pgm filter, (14 more...)

arXiv.org Artificial Intelligence

2503.10349

Country:

North America > United States > California (0.05)
Asia > South Korea > Daejeon > Daejeon (0.04)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

Optimizing Language Models for Grammatical Acceptability: A Comparative Study of Fine-Tuning Techniques

Ratan, Shobhit, Knight, Farley, Jerfel, Ghada, Ho, Sze Chung

arXiv.org Artificial Intelligence, Jan-14-2025

This study explores the fine-tuning (FT) of the Open Pre-trained Transformer (OPT-125M) for grammatical acceptability tasks using the CoLA dataset. By comparing Vanilla-Fine-Tuning (VFT), Pattern-Based-Fine-Tuning (PBFT), and Parameter-Efficient Fine-Tuning techniques (PEFT) like Low-Rank Adaptation (LoRA), we demonstrate significant improvements in computational efficiency while maintaining high accuracy. Our experiments reveal that while VFT achieves the highest accuracy (81.2%), LoRA enhances FT by reducing memory usage and iteration time by more than 50% and increases accuracy in the PBFT case. Context Distillation (CD), though computationally efficient, underperformed with accuracy around 31%. Our findings contribute to democratizing access to large language models (LLMs) by reducing computational barriers.
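A small arithmetic sketch may help show why a low-rank update such as LoRA trains far fewer parameters than full fine-tuning. The layer width of 768 and the rank of 8 below are assumed values chosen for illustration, not figures from the paper, and the helper names are hypothetical; the sketch does not attempt to reproduce the reported memory or iteration-time savings.

```python
def full_params(d: int, k: int) -> int:
    """Trainable entries when the whole d x k weight matrix is updated."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable entries for a rank-r update B @ A, with B of shape d x r and A of shape r x k."""
    return r * (d + k)


if __name__ == "__main__":
    d = k = 768  # assumed layer width, for illustration only
    r = 8        # assumed LoRA rank, not taken from the paper
    full, low = full_params(d, k), lora_params(d, k, r)
    print(full, low, f"{low / full:.2%}")
```

Under these assumptions the low-rank factors hold about 2% of the entries of the full matrix, so gradients and optimizer state are kept for a much smaller set of weights while the original matrix stays frozen.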

accuracy, efficiency, epoch, (15 more...)

arXiv.org Artificial Intelligence

2501.07853

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Entity Extraction from High-Level Corruption Schemes via Large Language Models

Koletsis, Panagiotis, Gemos, Panagiotis-Konstantinos, Chronis, Christos, Varlamis, Iraklis, Efthymiou, Vasilis, Papadopoulos, Georgios Th.

arXiv.org Artificial Intelligence, Nov-11-2024

The rise of financial crime that has been observed in recent years has created an increasing concern around the topic and many people, organizations and governments are more and more frequently trying to combat it. Despite the increase of interest in this area, there is a lack of specialized datasets that can be used to train and evaluate works that try to tackle those problems. This article proposes a new micro-benchmark dataset for algorithms and models that identify individuals and organizations, and their multiple writings, in news articles, and presents an approach that assists in its creation. Experimental efforts are also reported, using this dataset, to identify individuals and organizations in financial-crime-related articles using various low-billion parameter Large Language Models (LLMs). For these experiments, standard metrics (Accuracy, Precision, Recall, F1 Score) are reported and various prompt variants comprising the best practices of prompt engineering are tested. In addition, to address the problem of ambiguous entity mentions, a simple, yet effective LLM-based disambiguation method is proposed, ensuring that the evaluation aligns with reality. Finally, the proposed approach is compared against a widely used state-of-the-art open-source baseline, showing the superiority of the proposed method.

dataset, identification, prompt engineering, (14 more...)

arXiv.org Artificial Intelligence

2409.13704

Country:

Europe > Greece > Attica > Athens (0.05)
Asia > Pakistan (0.04)
Asia > India > NCT > New Delhi (0.04)
Asia > India > NCT > Delhi (0.04)

Genre:

Research Report (0.40)
Overview (0.34)

Industry:

Law Enforcement & Public Safety > Fraud (0.70)
Banking & Finance > Trading (0.46)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
